Exact Decoding on Latent Variable Conditional Models is NP-Hard
Latent variable conditional models, including latent conditional random
fields as a special case, are popular models for many natural language
processing and computer vision tasks. The computational complexity of exact
decoding/inference in latent conditional random fields has been unclear. In
this paper, we clarify the computational complexity of exact decoding: we
analyze the problem and demonstrate that it is NP-hard even in a sequential
labeling setting. Furthermore, we propose the latent-dynamic inference method
(LDI-Naive) and its bounded version (LDI-Bounded), which are able to perform
exact or almost-exact inference using top-$n$ search and dynamic programming.
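The abstract does not spell out the procedure, so here is a minimal brute-force Python sketch of the LDI idea under assumed toy scores (`emit`, `trans`) and an assumed latent-to-label projection: latent paths are consumed best-first, their probability mass is aggregated by projected label sequence, and the bounded variant stops once no competitor can overtake the current best. The real LDI replaces the exhaustive enumeration with top-$n$ search and dynamic programming.

```python
import itertools
import math
import random

# Toy latent chain with T positions and H latent states; each latent
# state h projects deterministically to label h // 2 (two latent
# states per label). All scores here are made up for illustration.
T, H = 4, 4
random.seed(0)
emit = [[random.uniform(-1, 1) for _ in range(H)] for _ in range(T)]
trans = [[random.uniform(-1, 1) for _ in range(H)] for _ in range(H)]

def path_weight(path):
    s = sum(emit[t][h] for t, h in enumerate(path))
    s += sum(trans[a][b] for a, b in zip(path, path[1:]))
    return math.exp(s)

# Brute-force enumeration sorted by weight stands in for the top-n
# search over latent paths used in practice.
paths = sorted(itertools.product(range(H), repeat=T),
               key=path_weight, reverse=True)
total = sum(path_weight(p) for p in paths)

mass, seen, best = {}, 0.0, None
for p in paths:                          # consume latent paths best-first
    w = path_weight(p)
    seen += w
    labels = tuple(h // 2 for h in p)    # project latent states to labels
    mass[labels] = mass.get(labels, 0.0) + w
    best = max(mass, key=mass.get)
    runner_up = max((v for k, v in mass.items() if k != best), default=0.0)
    # Stop once no competitor, even granted all unseen mass, can
    # overtake the current best: the decode is provably exact here.
    if mass[best] >= runner_up + (total - seen):
        break

print("decoded label sequence:", best)
```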
Hybrid Oracle: Making Use of Ambiguity in Transition-based Chinese Dependency Parsing
In the training of transition-based dependency parsers, an oracle is used to
predict a transition sequence for a sentence and its gold tree. However, the
transition system may exhibit ambiguity, that is, there can be multiple correct
transition sequences that form the gold tree. We propose to make use of this
property in the training of neural dependency parsers, and present the Hybrid
Oracle. The new oracle gives all the correct transitions for a parsing state,
which are used in the cross-entropy loss function to provide a better
supervisory signal. It is also used to generate different transition sequences
for a sentence to better explore the training data and improve the
generalization ability of the parser. Evaluations show that parsers trained
using the hybrid oracle outperform parsers using the traditional oracle in
Chinese dependency parsing. We also provide an analysis from a linguistic
perspective. The code is available at https://github.com/lancopku/nndep.
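One plausible reading of "all the correct transitions ... used in the cross-entropy loss" is a loss on the total probability mass assigned to the set of correct transitions; the paper may instead use, e.g., a uniform target distribution over them. A minimal NumPy sketch under that assumption (the function name and toy logits are illustrative, not from the paper):

```python
import numpy as np

def hybrid_oracle_loss(logits, correct_actions):
    """Cross-entropy against the set of all correct transitions.

    With a traditional oracle, `correct_actions` holds a single
    canonical transition; the hybrid oracle may supply several, and
    this loss rewards probability mass placed on any of them.
    """
    z = logits - logits.max()                  # numerical stability
    probs = np.exp(z) / np.exp(z).sum()
    return -np.log(probs[list(correct_actions)].sum())

# Toy parsing state with 4 possible transitions; suppose both SHIFT (0)
# and LEFT-ARC (2) lead to the gold tree, so both count as correct.
logits = np.array([2.0, -1.0, 1.5, 0.3])
print(hybrid_oracle_loss(logits, {0, 2}))
```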
The largest singletons in weighted set partitions and its applications
Recently, Deutsch and Elizalde studied the largest and the smallest fixed
points of permutations. Motivated by their work, we consider the analogous
problems in weighted set partitions. Let $A_{n,k}$ denote the total
weight of partitions on $[n+1]$ with the largest singleton $\{k+1\}$. In this
paper, explicit formulas for $A_{n,k}$ and many combinatorial
identities involving $A_{n,k}$ are obtained by umbral operators and
combinatorial methods. As applications, we investigate three special cases such
as permutations, involutions and labeled forests. Particularly in the
permutation case, we derive a surprising identity analogous to the Riordan
identity related to tree enumerations, namely,
$\sum_{k=0}^{n}\binom{n}{k}D_{k+1}(n+1)^{n-k} = n^{n+1}$,
where $D_k$ is the $k$-th derangement number, i.e., the number of permutations
of $[k]$ with no fixed points. Comment: 15 pages
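The identity is easy to sanity-check numerically; the short script below verifies $\sum_{k=0}^{n}\binom{n}{k}D_{k+1}(n+1)^{n-k}=n^{n+1}$ for small $n$, computing derangement numbers via the standard recurrence $D_k = kD_{k-1} + (-1)^k$ with $D_0 = 1$:

```python
from math import comb

def derangement(n):
    # D_0 = 1, D_k = k * D_{k-1} + (-1)^k
    d = 1
    for k in range(1, n + 1):
        d = k * d + (-1) ** k
    return d

for n in range(1, 8):
    lhs = sum(comb(n, k) * derangement(k + 1) * (n + 1) ** (n - k)
              for k in range(n + 1))
    assert lhs == n ** (n + 1), n
print("identity verified for n = 1..7")
```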
A Generic Online Parallel Learning Framework for Large Margin Models
To speed up training, many existing systems parallelize online learning
algorithms. However, most research focuses mainly on stochastic gradient
descent (SGD) rather than other algorithms. We propose a generic online
parallel learning framework for large-margin models, and analyze our framework
on popular large-margin algorithms, including MIRA and the structured
perceptron. Our framework is lock-free and easy to implement on existing
systems. Experiments show that systems using our framework gain near-linear
speedup as the number of threads increases, with no loss in accuracy.
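The abstract gives no implementation details, but the lock-free pattern it describes resembles Hogwild-style shared-memory updates. Below is a minimal Python sketch with a plain binary perceptron standing in for the large-margin learner (the data and names are illustrative; CPython's GIL means this demonstrates the no-lock update pattern rather than a real parallel speedup):

```python
import threading
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 20))
w_true = rng.normal(size=20)
y = np.sign(X @ w_true)          # separable toy labels

w = np.zeros(20)                 # shared weights, updated with no lock

def worker(rows):
    for i in rows:
        # Read the (possibly stale) shared weights, then write the
        # update in place; no lock is taken, as in a lock-free framework.
        if y[i] * (X[i] @ w) <= 0:
            w += y[i] * X[i]     # perceptron update

threads = [threading.Thread(target=worker, args=(range(t, len(X), 4),))
           for t in range(4)]
for t in threads: t.start()
for t in threads: t.join()

print("training accuracy:", np.mean(np.sign(X @ w) == y))
```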
Lock-Free Parallel Perceptron for Graph-based Dependency Parsing
Dependency parsing is an important NLP task, and the structured perceptron is
a popular approach to it. However, graph-based dependency parsing has a
decoding time complexity of $O(n^3)$, so it suffers from slow training. To
deal with this problem, we propose a parallel algorithm called the parallel
perceptron. The parallel algorithm can make full use of a multi-core computer,
which saves a great deal of training time. In experiments we observe that
dependency parsing with the parallel perceptron achieves 8-fold faster
training than traditional structured perceptron methods when using 10 threads,
with no loss at all in accuracy.
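For concreteness, here is a small sketch of the underlying (sequential) structured perceptron for arc-factored graph-based parsing; the hashed features, the toy sentence, and the greedy head-selection decoder (standing in for a real $O(n^3)$ projective or MST decoder) are all illustrative assumptions, and the paper's contribution is running such updates lock-free in parallel:

```python
import numpy as np

DIM = 256

def feats(sent, h, m):
    # Illustrative hashed features for the arc head -> modifier.
    f = np.zeros(DIM)
    f[hash((sent[h], sent[m])) % DIM] += 1.0
    f[hash(("dist", h - m)) % DIM] += 1.0
    return f

def decode(sent, w):
    # Greedy head selection stands in for a real O(n^3)/MST decoder.
    return [max((h for h in range(len(sent)) if h != m),
                key=lambda h: w @ feats(sent, h, m))
            for m in range(1, len(sent))]

def perceptron_update(sent, gold_heads, w):
    pred = decode(sent, w)
    for m, (g, p) in enumerate(zip(gold_heads, pred), start=1):
        if g != p:  # reward gold arc features, penalize predicted ones
            w += feats(sent, g, m) - feats(sent, p, m)

w = np.zeros(DIM)
sent = ["<ROOT>", "economic", "news", "had", "little", "effect"]
gold = [2, 3, 0, 5, 3]   # gold head index for each non-root token
for _ in range(10):
    perceptron_update(sent, gold, w)
print("predicted heads:", decode(sent, w))
```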
A Semantic Relevance Based Neural Network for Text Summarization and Text Simplification
Text summarization and text simplification are two major ways to simplify text
for poor readers, including children, non-native speakers, and the
functionally illiterate. Text summarization produces a brief summary of the
main ideas of a text, while text simplification aims to reduce its linguistic
complexity while retaining the original meaning. Recently, most approaches to
text summarization and text simplification have been based on the
sequence-to-sequence model, which has achieved much success in many text
generation tasks. However, although the generated simplified texts are
literally similar to the source texts, they often have low semantic relevance.
In this work, our goal is to improve the semantic relevance between source
texts and simplified texts for text summarization and text simplification. We
introduce a Semantic Relevance Based neural model to encourage high semantic
similarity between texts and summaries. In our model, the source text is
represented by a gated attention encoder, while the summary representation is
produced by a decoder; the similarity score between the two representations is
maximized during training. Our experiments show that the proposed model
outperforms state-of-the-art systems on two benchmark corpora.
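A minimal sketch of the training objective this suggests: add a similarity term between the encoder's source representation and the decoder's summary representation to the usual generation loss. The exact combination in the paper may differ; `lam` and the toy vectors below are assumptions:

```python
import numpy as np

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def semantic_relevance_loss(nll, src_repr, sum_repr, lam=0.5):
    # Maximizing similarity == minimizing (1 - cosine similarity);
    # lam trades generation quality against semantic relevance.
    return nll + lam * (1.0 - cosine(src_repr, sum_repr))

src_repr = np.array([0.2, 0.9, -0.3])   # e.g. gated-attention encoder state
sum_repr = np.array([0.1, 0.7, -0.2])   # e.g. final decoder hidden state
print(semantic_relevance_loss(nll=2.3, src_repr=src_repr, sum_repr=sum_repr))
```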
On cyclotomic elements and cyclotomic subgroups in K_2 of a field
The problem of expressing an element of K_2(F) in a more explicit form has
given rise to many works. To avoid a restrictive condition in a work of Tate,
Browkin considered cyclotomic elements as candidates for elements with an
explicit form. In this paper, we refine Browkin's conjecture about cyclotomic
elements into more precise forms; in particular, we introduce the notion of a
cyclotomic subgroup. In the rational function field case, we completely
determine the exact numbers of cyclotomic elements and cyclotomic subgroups
contained in a subgroup generated by finitely many different cyclotomic
elements, while in the number field case, using Faltings' theorem on the
Mordell conjecture, we prove that there exist subgroups generated by
infinitely many cyclotomic elements to the power of some prime which contain
no nontrivial cyclotomic elements.
Markov Chain Block Coordinate Descent
Block coordinate gradient descent (BCD) has been a powerful method for
large-scale optimization. This paper considers a BCD method that successively
updates a series of blocks selected according to a Markov chain. This kind of
block selection is neither i.i.d. random nor cyclic. On the other hand, it is
a natural choice for some applications in distributed optimization and Markov
decision processes, where i.i.d. random and cyclic selections are either
infeasible or very expensive. By applying mixing-time properties of a Markov
chain, we prove convergence of Markov chain BCD for minimizing Lipschitz
differentiable functions, which can be nonconvex. When the functions are
convex or strongly convex, we establish sublinear and linear convergence
rates, respectively. We also present a Markov chain inertial BCD method.
Finally, we discuss potential applications.
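A minimal NumPy sketch of the scheme on a least-squares objective: the block to update is chosen by a random walk on a ring of blocks (a selection that is neither i.i.d. nor cyclic), and only that block's coordinates take a gradient step. The problem sizes and step size are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
n_blocks, blk = 5, 10
A = rng.normal(size=(80, n_blocks * blk))
b = rng.normal(size=80)
x = np.zeros(n_blocks * blk)

print("objective before:", 0.5 * np.linalg.norm(A @ x - b) ** 2)

# Random walk on a ring of blocks: from block i, move to i-1, i, or i+1.
# This block selection is neither i.i.d. nor cyclic, matching the setting.
i, step = 0, 1e-3
for _ in range(20000):
    sl = slice(i * blk, (i + 1) * blk)
    grad_blk = A[:, sl].T @ (A @ x - b)   # gradient w.r.t. block i only
    x[sl] -= step * grad_blk
    i = (i + rng.integers(-1, 2)) % n_blocks

print("objective after: ", 0.5 * np.linalg.norm(A @ x - b) ** 2)
```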
A Chinese Dataset with Negative Full Forms for General Abbreviation Prediction
Abbreviation is a common phenomenon across languages, especially in Chinese.
In most cases, if an expression can be abbreviated, its abbreviation is used
more often than its fully expanded form, since people tend to convey
information in the most concise way. For various language processing tasks,
abbreviation is an obstacle to improving performance, as the textual form of
an abbreviation does not express useful information unless it is expanded to
the full form. Abbreviation prediction means associating fully expanded forms
with their abbreviations. However, due to the deficiency of abbreviation
corpora, this task has been limited in current studies, especially considering
that general abbreviation prediction should also cover those full-form
expressions that do not have valid abbreviations, namely the negative full
forms (NFFs). Corpora incorporating negative full forms for general
abbreviation prediction are few in number. In order to promote research in
this area, we build a dataset for general Chinese abbreviation prediction,
which requires a few preprocessing steps, and evaluate several different
models on the built dataset. The dataset is available at
https://github.com/lancopku/Chinese-abbreviation-dataset.
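Abbreviation prediction is commonly cast as character-level sequence labeling: each character of the full form is labeled as kept or dropped. A small sketch under that formulation (the greedy alignment and the convention that an NFF keeps every character are assumptions, not details from the paper):

```python
def char_labels(full_form, abbreviation):
    """Label each character of the full form: 1 = kept in the
    abbreviation, 0 = dropped (greedy left-to-right alignment)."""
    labels, j = [], 0
    for ch in full_form:
        if j < len(abbreviation) and ch == abbreviation[j]:
            labels.append(1)
            j += 1
        else:
            labels.append(0)
    assert j == len(abbreviation), "abbreviation must be a subsequence"
    return labels

print(char_labels("北京大学", "北大"))   # [1, 0, 1, 0]
# A negative full form (NFF) has no shorter form, so every character
# is kept -- its label sequence is all ones.
print(char_labels("人民", "人民"))       # [1, 1]
```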
Learning Sentiment Memories for Sentiment Modification without Parallel Data
The task of sentiment modification requires reversing the sentiment of the
input while preserving the sentiment-independent content. However, aligned
sentences with the same content but different sentiments are usually
unavailable. Due to the lack of such parallel data, it is hard to extract
sentiment-independent content and reverse the sentiment in an unsupervised
way, and previous work usually cannot reconcile sentiment transformation with
content preservation. In this paper, motivated by the fact that the
non-emotional context (e.g., "staff") provides strong cues for the occurrence
of emotional words (e.g., "friendly"), we propose a novel method that
automatically extracts appropriate sentiment information from learned
sentiment memories according to the specific context. Experiments show that
our method substantially improves the degree of content preservation and
achieves state-of-the-art performance. Comment: Accepted by EMNLP 2018.
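A toy dictionary version of the idea: non-emotional context words index a memory of emotional words per polarity, and the emotional word in the input is replaced by one retrieved for the target sentiment. The real model learns these memories and the retrieval with attention; the table and helper below are purely illustrative:

```python
# Sentiment memory keyed by non-emotional context words; each entry
# stores emotional words observed with that context, per polarity.
memory = {
    "staff": {"positive": "friendly", "negative": "rude"},
    "pizza": {"positive": "delicious", "negative": "bland"},
}
emotional = {w for entry in memory.values() for w in entry.values()}

def modify_sentiment(tokens, target):
    contexts = [t for t in tokens if t in memory]
    out = []
    for tok in tokens:
        if tok in emotional and contexts:
            # Replace the emotional word with one retrieved from the
            # memory of a non-emotional context word in the sentence.
            out.append(memory[contexts[0]][target])
        else:
            out.append(tok)
    return " ".join(out)

print(modify_sentiment("the staff is friendly".split(), "negative"))
# -> the staff is rude
```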
- …